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Abstract Do the amino acid sequence identities of residues that make contact across protein 
interfaces covary during evolution? If so, such covariance could be used to predict contacts across 
interfaces and assemble models of biological complexes. We find that residue pairs identified using 
a pseudo-likelihood-based method to covary across protein-protein interfaces in the 50S ribosomal 
unit and 28 additional bacterial protein complexes with known structure are almost always in contact 
in the complex, provided that the number of aligned sequences is greater than the average length 
of the two proteins. We use this method to make subunit contact predictions for an additional 
36 protein complexes with unknown structures, and present models based on these predictions 
for the tripartite ATP-independent periplasmic (TRAP) transporter, the tripartite efflux system, the 
pyruvate formate lyase-activating enzyme complex, and the methionine ABC transporter. 
DOI: 10.7554/eLife.02030.001 



Introduction 

Recent work has demonstrated the accuracy of coevolution-based contact prediction for monomeric 
proteins using a global statistical model {Thomas et al., 2008) to distinguish between direct and indi- 
rect couplings (Marks eta/., 2011; Morcos eta/., 2011; Hopfetal., 2012; Nugent and Jones, 2012; 
Jones et al., 2012; Lapedes et al., 2012; Marks et al., 2012; Sulkowska et al., 2012; Kamisetty 
et al., 2013). While early approaches relied on estimating an inverse covariance matrix {Marks et al., 
2011; Morcos et al., 2011; Jones et al., 2012), more recent studies have shown that a pseudo- 
likelihood-based approach {Balakrishnan et al., 2011) results in more accurate predictions {Ekeberg 
et al., 2013; Kamisetty et al., 2013) for a range of alignment sizes and protein lengths. 

In contrast to this rich body of work for monomeric proteins, relatively little is known about the 
utility of such statistical models in predicting protein-protein interactions. The more general problem 
of predicting if two proteins interact with each other has been studied extensively using a wide variety 
of approaches (de Juan et al., 2013; Hosur et al., 2012; Zhang et al., 2012; Shoemaker and 
Panchenko, 2007, Valencia and Pazos, 2002, Ochoa and Pazos, 2010). Amino acid residue revo- 
lution has been used to predict residue-residue interactions across interfaces with local statistical 
models {Pazos et al., 1997; Halperin et al., 2006). As noted above, the accuracy of these models is 
reduced by the confounding of direct and indirect correlations {Lapedes et al., 1999; Weigt et al., 
2009); the application of global statistical models to coevolution-based contact prediction across 
interfaces has been limited to the case of the histidine-kinase/response-regulator two component 
system {Burger and van Nimwegen, 2008; Weigt et al., 2009; Schug et al., 2009; Dago et al., 2012). 
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eLife digest Protei ns are considered the 'workhorse molecules' of life and they are involved in 
virtually everything that cells do. Proteins are strings of amino acids that have folded into a specific 
three-dimensional shape. Proteins must have the correct shape to function properly, as they often 
work by binding to other proteins or molecules — much like a key fitting into a lock. Working out the 
structure of a protein can, therefore, provide major insights into how the protein does its job. 

Two or more proteins can bind together and form a complex to perform various tasks; and 
solving the structures of these complexes can be challenging, even if the structures of the protein 
subunits are known. Now, Ovchinnikov, Kamisetty, and Baker have developed a method for 
predicting which parts of the proteins make contact with each other in a two-protein complex. 

Different species can have copies of the same proteins; but a copy from one species might have 
different amino acids at certain positions when compared to a related copy from another species. 
As such, when pairs of interacting proteins from different species are compared, there will be many 
positions in the two proteins that vary. However, if the amino acid at a position in one protein (let's 
call it 'X') varies, and the amino acid at, say, position 'Y' in the other protein also varies such that for 
any given amino acid at position Y there is often a specific amino acid at position X; positions X and 
Y are said to 'co-vary'. Ovchinnikov et al. noticed that when a pair of amino acids (one from each 
protein in a two-protein complex) co-varied, these two amino acids tended to make contact with 
each other at the protein-protein interface. 

Ovchinnikov et al. used the new method to make predictions about the protein-protein 
interfaces in 28 protein complexes found in bacteria, and also to make a prediction about the 
interface between protein subunits in the bacterial ribosome. When these predictions were checked 
against the actual structures, which were all known beforehand, they were found to be accurate if 
the number of copies of each protein being compared is greater than the average length of the two 
proteins. 

Ovchinnikov et al. went on to predict the amino acids on the protein-protein interfaces for 
another 36 bacterial protein complexes with unknown structures, and to present models for four 
larger complexes. The next challenge is to extend the method to protein complexes that are found 
only in eukaryotes (i.e., not in bacteria). Since the number of related copies for eukaryotic proteins 
tends to be smaller, there are fewer proteins to compare and it is therefore harder to detect 
'covariation' when it occurs. 
DOI: 10.7554/el_ife.02030.002 



In this study, we examine residue-residue covariation across protein-protein interfaces using a 
pseudo-likelihood-based statistical method. In a large set of complexes of known structure, we find 
that covarying pairs of positions are almost always in contact in the three-dimensional structure, pro- 
vided there are sufficient aligned sequences. We find further that significant residue-residue covari- 
ance occurs frequently between physically interacting protein pairs but very rarely between 
non-interacting pairs, and hence should be useful for predicting whether two proteins interact. We 
use the pseudo-likelihood method to predict contacts across protein-interfaces for 36 evolutionarily 
conserved complexes of unknown structure and present structure models for four of the complexes 
particularly well constrained by these data. 

Results 

For a single protein family, it is straightforward to generate a multiple sequence alignment and subse- 
quently identify covarying residue pairs. To identify covarying residue pairs between two proteins A 
and B is not as easy: only organisms that contain an ortholog of protein A and protein B contribute, 
and in generating the alignments the protein A and protein B sequences for each organism must 
be properly paired. To simplify the ortholog identification problem, we focus on pairs of genes with 
conserved chromosomal locations separated in the genome by fewer than 20 other annotated genes. 
We then build GREMLIN global statistical models for sequences in the paired protein families. The 
models have 'one-body' parameters for each amino acid at each position in the two proteins, and 
'two-body' parameters for each pair of amino acids at each pair of positions in the two proteins. 
These parameters are obtained by maximizing the pseudo-likelihood of the observed sequence pairs, 



Ovchinnikov et al. eLife 2014;3:e02030. DOI: 10.7554/eLife.02030 



2 of 21 



Research article Biophysics and structural biology | Genomics and evolutionary biology 

rather than their likelihood, which makes the quite formidable estimation tractable. In the following 
sections, we investigate the structural contexts of residue pairs with large values of these two-body 
coupling parameters 

Residue-residue covariation in the bacterial 50S ribosomal unit 

We began by studying residue-residue coupling parameters in the bacterial 50S ribosomal subunit — 
the largest evolutionarily conserved bacterial multiprotein complex with an atomic resolution struc- 
ture. For each individual protein in the complex, we constructed multiple sequence alignments by 
querying the UniProt sequence database {Wu et a/., 2006) for homologous sequences. For every 
pair of proteins in the complex, we then constructed a paired multiple sequence alignment ('Materials 
and methods'). For each such paired alignment, we built a GREMLIN global statistical model, com- 
puted normalized coupling strengths from the two body coupling parameters, and ranked inter pro- 
tein residue pairs based on these scores ('Materials and methods'). A coupling strength larger than 
one indicates higher than average coupling between two residues. 

We find that in the 50S ribosomal subunit only a small fraction of residue pairs coevolve, as indi- 
cated by coupling strengths (y axis of Figure 1A) greater than 1.5. Remarkably, the two residues in 
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Figure 1. Residue pairs with high normalized coupling strengths are in contact in the 50S ribosomal subunit. (A) Coupling strengths and inter-residue 
distances for each residue pair in the 50S subunit (black dots). Residue pairs with coupling strength greater than 1 .5 are nearly always less than 8 A apart. 

(B) Locations of coevolving (high coupling strength) residue pairs in the protein component of the 50S subunit. The monomers have been pulled apart 
slightly for clarity. Lines connect residue pairs with coupling strength greater than 1 .5; yellow, distance less than 8 A; orange, distance less than 12 A. 

(C) Protein pairs with strong inter-residue covariation (colors) make contact in the three-dimensional structure (black boxes). For each protein pair, the 
sum of the coupling strength greater than 1 .5 for each pair of 50S subunit proteins is indicated; black boxes indicate contacts in the crystal structure. 

(D) Dependence of contact prediction accuracy on coupling strength and the number of sequences in the alignments. For each of the indicated coupling 
strength cutoffs (colors), the frequency of contact in the 50S structure (y axis) was computed for sub alignments with different sequence depths (x axis). 
DOI: 10.7554/eLife.02030.003 

The following figure supplements are available for figure 1 : 

Figure supplement 1. Determining GREMLIN scores from normalized coupling strengths. 

DOI: 10.7554/eLife.02030.004 
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each of these pairs are almost all within 8 A of each other in the 50S crystal structure {Figure 1A) and 
all are within 12 A. The locations of the covarying residue pairs in the 50S structure (with the individual 
proteins pulled apart for clarity) are shown in Figure 1B; yellow lines indicate distances less than 8 A 
and orange lines, distances less than 12 A. For the 50S ribosome, the GREMLIN model was built using 
sequence data from ~1 500 non-redundant genomes; Figure 1D suggests that for complexes with such 
large numbers of aligned sequence, residue-residue interactions across interfaces can be predicted 
with quite high confidence based on amino acid sequence covariation. 

For a large protein-protein complex, can the sum of the coupling strengths between pairs of pro- 
teins in the complex be used to distinguish directly interacting and non-interacting protein pairs? In 
the 50S subunit, every pair of proteins with summed coupling strengths (numbers in Figure 1Q greater 
than 1.5 interacts with each other (boxes in Figure 1Q. There are, however, several instances of pro- 
tein pairs that contact in the 50S subunit for which no covariance is observed; clearly not every inter- 
action will be identified by the sum of the coupling strengths, for example between two proteins that 
are held together primarily by the ribosomal RNA. 

How many aligned sequences are required for accurate contact prediction? To assess the depend- 
ence on alignment depth, we generated paired sub-alignments with varying numbers of sequences for 
every pair of 50S proteins and recomputed coupling strengths for each sub-alignment. For each align- 
ment depth, we calculated the fraction of residue pairs within 12 A for different ranges of coupling 
strengths. We find that the greater the number of aligned sequences, the lower the value of the cou- 
pling strength above which residue pairs are likely to be in contact in the structure (Figure 1D). For 
example, if the number of aligned sequences is greater than the sum of the lengths of the two pro- 
teins, residue-residue contact predictions are likely to be accurate if the coupling strength is 2 or 
greater (Figure 1D: orange dots), while if there are twice as many sequences, contact predictions are 
accurate above a coupling strength of 1 .5 (the cutoff shown in Figure 1A). A sigmoidal function of the 
coupling strength and the number of sequences per position in the complex accurately fits the 
observed contact frequency data ('Materials and methods' and Figure 1 — figure supplement 7); we 
refer to the fitted values as GREMLIN scores for the remainder of the paper. 

Bacterial complex benchmark 

We next generated paired-alignments for all E. coli gene-pairs that had conserved intergenic distances 
across genomes deposited in the UniProt ('Materials and methods'). As the 50S results (Figure 1D) 
suggested that alignment depths greater than the average of the lengths of the two proteins were 
required for accurate prediction, we focused on paired alignments with at least this number of 
sequences — 1126 gene pairs in total excluding the ribosomal proteins. For each of these 1126 pairs, 
we generated GREMLIN global statistical models and determined the coupling strength for each res- 
idue pair. 

For 64 of the 1 126 gene pairs, at least one pair of residues had GREMLIN score >0.85. For 28 of the 
64 pairs three-dimensional structures have been determined experimentally, and the locations of the 
residue pairs with GREMLIN score >0.6 for several of these complexes are shown in Figure 2A (pairs 
within 8 A are in yellow, between 8 A and 12 A in orange, and greater than 12 A, in red). Almost all 
pairs with GREMLIN scores greater than 0.6 are in contact in the complex structures, with the notable 
exception of the NADH dehydrogenase subunits (Figure 2B). The complex is thought to undergo 
a cascade of conformational changes during electron transfer (Baradaran et a/., 2013); the high 
GREMLIN score contacts not made in the solved structure may provide insight into the nature of these 
changes. As observed for the 50S complex (Figure 1Q, the existence of one or more high GREMLIN 
scores between two proteins provides evidence that the proteins interact: 44% (28/64) of the protein 
pairs with high GREMLIN scores form a complex which has been solved crystallographically compared 
to 8% (78/1 1 26) over the whole set. 

Contact predictions for complexes of unknown structure 

The results with the 50S ribosome and the protein pairs in the benchmark suggest that interactions 
can be accurately predicted across protein-protein interfaces given a sufficient number of aligned 
sequences. In Figure 3, we provide residue-residue contact predictions for the 36 of the 64 com- 
plexes with currently unknown structure (the E. coli gene sequences were clustered, and hence each 
complex may represent multiple E. coli gene pairs). These predictions should contribute to the deter- 
mination of the structures of these biologically important complexes. 
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Figure 2. Residue covariation in complexes with known structures. (A) Residue-pairs across protein chains with high 
GREMLIN scores almost always make contact across protein interfaces in experimentally determined complex 
structures. All contacts with GREMLIN scores greater than 0.6 are shown; the structures are pulled apart for clarity. 
Labels are according to chains in the PDB structure. (B) Complex I of the electron transport chain has an unusually 
large number of highly co-varying inter residue pairs not in contact in the crystal structure of 4HEA; these contacts 
may be formed in different state of the complex. Residue pairs within 8 A are in yellow, between 8 A and 12 A in orange, 
and greater than 12 A, in red. Distances are the minimal distances between any side chain heavy atom. Labels are 
according to chains in 4HEA. (C) Dependence of inter-residue distance distributions on GREMLIN score. All residue- 
residue pairs between subunits in the benchmark set were grouped into four bins based on their GREMLIN score 
(colors), and the distribution of residue-residue distances (x axis) within each bin computed from the three- 
dimensional structures. See Figure 2 — source data 1 for the table of all the interfaces used in the calculation. 
DOI: 10.7554/eLife.02030.005 

The following source data are available for figure 2: 
Source data 1. PDB benchmark set. 

DOI: 10.7554/eLife.02030.006 



From contacts to structural models 

Are the predicted contacts useful in assembling models of the protein complex from models of 
each component? We evaluated this on a docking test set containing 18 protein complexes from the 
benchmark set where at least one component (or a close homolog) had a known structure in the 
apo form ('Materials and methods', docking test-set). We developed a docking protocol that used 
the predicted contacts as distance restraints and sampled the space of physically plausible 
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structures to generate models of the protein-protein complex. The model with the best restraint 
score had an interface that was within 4 A (in root mean square deviation) of the native interface 
in 1 4 of the 18 cases and had more than half the native contacts in 1 6 of the 1 8 cases {Figure 4A, 
Figure 4 — figure supplement 1). Two of the cases in which the iRMSD (interface root-mean-square 
deviation) was the highest (bottom of table in A) are illustrated in Figure 4B-C: the high iRMSD is due 
to large changes in the conformation of one of the monomers upon binding; despite these changes 
the binding interface is reasonably accurately identified. Conformational changes that hinder the rigid- 
body docking protocol from sampling the bound conformation also occurred for thiazole synthase/ 
sulfur carrier and phenylalanyl-tRNA synthase with iRMSD of 4.8A and 4.3A, respectively. In Figure 4D, 
a second energy minimum corresponds to a second interface in the complex with a different homo- 
oligomer subunit. In the absence of conformational changes, predicted contact guided docking is very 
accurate. The same protocol, on a positive control set of known bound structures of 41 protein-pairs 
(including 15 protein-pairs from the NADH electron transport complex), generated models that were 
within 2 A of the native complex structure in 38 cases and within 4 A in all but one case {Figure 4 — 
source data 1, Figure 4 — figure supplement 2). 

Taken together, these results suggest that in cases with small conformational change, the docking 
protocol can recover the entire interface to high accuracy and in cases where binding is accompanied by 
a large conformational change, the protocol recovers the largest intact and/or unobstructed interface. 

Of the complexes with unknown structure listed in Figure 3, we selected four cases with two or 
more high GREMLIN score (>0.6) contact predictions across the interface that had experimentally 
determined structures for most of the subunits ('Materials and methods') and generated structural 
models of the complexes. These models provide the basis for formulating hypotheses about the struc- 
ture/function of the complex, but we emphasize they are not experimentally determined structures; in 
particular the assumption in the modeling procedure that there are not large backbone rearrange- 
ments could be incorrect — in such cases the overall organization of the complex is still likely to be 
correct but the details of the interfaces could be considerably in error. 

The TRAP complex 

The tripartite ATP-independent periplasmic (TRAP) transporters are composed of three proteins: two 
integral membrane proteins YIAM and YIAN, and one periplasmic protein YIAO {Mulligan et al., 
2011). The structure of the periplasmic domain is known, but the membrane portion is unknown. To 
generate a model of the three-dimensional structure of the complex, we built YIAM models using 
Rosetta de novo structure prediction (Simons et al., 1999; Raman et al., 2009) guided by the intra- 
monomer predicted contacts, and models for YIAN and YIAO using RosettaCM comparative mod- 
eling. For YIAN the homologous structure of 4f35 {Mancusso et al., 2012) was used. The three 
monomer structure models were then assembled using PatchDock {Duhovny et al., 2002) and 
RosettaRelax {Conway et al., 2014) guided by the predicted intersubunit contacts ('Materials and 
methods'). In the resultant model of the complex {Figure 5), YIAO interacts with both of the mem- 
brane components; this is supported by a number of intersubunit contacts (yellow lines). 

Tripartite efflux system 

Tripartite efflux complexes span both the inner and outer membrane, and are widely used in bacteria to 
pump toxic compounds out of the cell. The mode of interactions between the outer membrane factor 
and the membrane fusion protein is unresolved, with reports suggesting either a tip-to-tip interaction, 
the insertion of one into the other, or a multistage interaction with an initial tip-to-tip interaction, fol- 
lowed by sliding one through the channel of the other {Long et al., 2012). We generated homology 
models for the subunits based on the alignments to 1yc9 {Federici et al., 2005) and 3fpp {Yum et al., 
2009) and docked them to generate models of the multidrug resistance protein complex. The predicted 
residue-residue contacts for this family of complexes support the tip-to-tip interaction {Figure 4; yellow 
lines); the coevolution data did not provide any evidence to support the insertion model. 

Pyruvate formate lyase-activating enzyme complex 

Pyruvate formate-lyase (PFL) catalyzes the reaction of acetyl-CoA and formate from pyruvate and CoA 
in the Fermentation pathway. Formate acetyltransferase 1 or Pyruvate formate-lyase 1 (PFLB) is acti- 
vated by Pyruvate formate-lyase 1 -activating enzyme (PFLA). The structure of the complex is unknown, 
but the structures of the individual proteins have been solved (PDB ids: 3c8f [Becker and Kabsch, 
2002] and 1h16 [Vey et al., 2008]). We carried out rigid body docking calculations with these two 



Ovchinnikov et al. eLife 2014;3:e02030. DOI: 10.7554/eLife.02030 



6 of 21 



Research article 



Biophysics and structural biology | Genomics and evolutionary biology 



YIAM_YIAN (3.9L) 



21 F 


246 F 


1.00 


91_l 


25_L 


1.00 


42 R 


52 D 


0.97 


14 A 


268 L 


0.93 


87 L 


26 L 


0.89 


59 D 


273 K 


0.89 


17 S 


245 A 


0.87 


28Y 


234_L 


0.84 


43 Y 


48 V 


0.82 


10 A 


268 L 


0.79 


64 Q 


17 G 


0.76 


56 A 


276 S 


0.75 


9 L 


272 A 


0.75 


46 V 


52 D 


0.72 


13 L 


271 A 


0.67 


50 F 


51 A 


0.66 


95 L 


47 L 


0.62 


46 V 


51 A 


0.60 


10 A 


269 I 


0.60 


YIAN_YIAO (3.6L) 




168 S 


196 Y 


1 .00 


296 E 


62 K 


0.89 


166 S 


221 E 


0.72 


237 P 


197 T 


0.59 


YIAM 


VIAO (5.0L) 




33 S 


186 N 


0.56 


PFLAPFLB (0.7L) 




66 F 


650 T 


0.92 


59 E 


643 K 


0.67 


62 T 


570 N 


0.54 


70 S 


688 G 


0.27 


MDTP_MDTN (2.4L) 


431 A 


146 E 


1.00 


223 S 


146 E 


0.96 


227 H 


142 P 


0.90 


231 A 


150 S 


0.74 


431 A 


148 F 


0.45 


435 R 


142 P 


0.44 


METI_METQ (2.0L) 




80 R 


193 D 


0.99 


80 R 


192 D 


0.91 


87 I 


182 L 


0.70 


87 I 


66 F 


0.56 


174 Q 


67 N 


0.50 


87_l 


75 A 


0.43 


UMUC 


JJMUD (1.0L) 


415_S 


38_l 


1.00 


421_V 


54_V 


0.86 


132_H 


76_S 


0.85 


404_R 


33_Y 


0.72 


183_D 


127_V 


0.66 


78_A 


34_V 


0.60 



IF1_SECY (1.4L) 

32 V 343 A 0.89 



YOHJ_YOHK(1.9L) 



64 C 


75 Y 


1.00 


61_N 


79_E 


1.00 


115 S 


94 I 


1.00 


79 G 


38 M 


1.00 


110 V 


101 V 


0.99 


106 V 


101 V 


0.99 


99 S 


115 A 


0.97 


83_M 


42_l 


0.96 


92 Q 


116 S 


0.93 


104 T 


184 A 


0.89 


105_L 


161_F 


0.84 


40 I 


146 I 


0.84 


71 M 


34 L 


0.83 


65_Y 


79_E 


0.81 


113 W 


169 M 


0.80 


16_V 


14J 


0.78 


74L 


65_L 


0.78 


82 V 


42 I 


0.76 


101 A 


157 L 


0.75 


39 S 


70 V 


0.72 


115 S 


176 A 


0.70 


106 V 


105 T 


0.68 


115 S 


179 L 


0.66 


100 C 


150 C 


0.65 


18 I 


72 A 


0.64 


109 L 


169 M 


0.63 


110 V 


97 I 


0.60 


110 V 


213 I 


0.60 


CYOC_NUOK (3.8L) 


49 V 


80 L 


0.96 


83 Y 


29 L 


0.68 


75 L 


73 A 


0.68 


65 E 


89 N 


0.65 


101 W 


29 L 


0.61 


42 I 


62 Y 


0.60 


ATP6_ATPF (2.3L) 




74 K 


34 E 


1.00 


149 V 


10 Q 


0.93 


77 T 


33 I 


0.92 


53 G 


13 A 


0.86 


155 L 


11 A 


0.78 


50 V 


13 A 


0.72 


255 I 


139 S 


0.68 


243 I 


20 F 


0.64 


49 S 


10 Q 


0.64 


263 Y 


21 C 


0.64 


239 V 


16 L 


0.63 


111 W 


10 Q 


0.62 


FECI_FECR (3.3L) 




162 E 


14 R 


0.96 


133 Q 


57 R 


0.95 


158 A 


18 H 


0.61 


FTSI_FTSW (2.0L) 




39_L 


313_V 


0.98 


42_V 


312_V 


0.94 


47_V 


309_Y 


0.80 


43_A 


309_Y 


0.74 


34_A 


357_L 


0.62 



FLGB_FLGC (2.4L) 



34 D 


13 A 


1.00 


34_D 


107_S 


0.99 


110 A 


117 E 


0.98 


98 N 


11 G 


0.98 


121 S 


129 T 


0.96 


34 D 


103 V 


0.96 


113 S 


108 A 


0.89 


27 A 


114 A 


0.88 


117 Q 


128 K 


0 80 


111 D 


122 V 


0.78 


113 S 


125 M 


0.71 


27 A 


27 N 


0.63 


131 M 


69 E 


0.61 


23 Q 


37 P 


0.60 


93 P 


45 K 


0.60 


FLIP FLIQ (2.2L) 




185 I 


84 L 


0.99 


54 I 


55 I 


0.90 


229 V 


77 V 


0.89 


203 V 


24 L 


0.68 


205 M 


66 G 


0.67 


213 P 


55 I 


0.66 


243 F 


72 L 


0.65 


149 L 


67 P 


0.64 


188_T 


80_L 


0.63 


FLGH 


FLGI (0.6L) 




52 F 


133 V 


0.99 


82 L 


257 S 


0.59 


FLHB FLIR (1.1L) 




181 A 


188 L 


0.98 


185_A 


192_T 


0.64 


FLIP_FLIR (1.5L) 




78 L 


187 A 


0.90 


97_T 


13_L 


0.71 


MLAD_MLAE (1.4L) 


14 L 


13 G 


0.93 


14 L 


24 G 


0.76 


8 I 


6 L 


0.65 


58 V 


6 L 


0.65 


65 V 


219 F 


0.63 


CCMC_CCME (0.8L) 


49_Q 


104_R 


0.95 


CCMA_CCMB (1.3L) 


95_E 


16_R 


0.87 


MREB 


.NIFU (0.6L) 




21_N 


57_R 


0.97 


149_l 


108_l 


0.72 


QMCA_YBBJ (1 5L) 


145_E 


115_R 


0.91 


36_R 


113_H 


0.72 



PTPC1 


_PTPD (1.2L) 


130_K 


58 K 


1.00 


210 L 


189 I 


1.00 


34 I 


122 C 


0.98 


126 D 


28 R 


0.98 


232 G 


180 L 


0.93 


123 T 


27 E 


0.86 


180 H 


258 S 


0.85 


125 A 


32 G 


0.85 


134 T 


53 L 


0.72 


125 A 


61 L 


0.66 


104 S 


71 V 


0.64 


12 L 


223 I 


0.60 


YADG_YADH (5.4L) 


97 Q 


22 R 


0.94 


49 G 


93 E 


0.93 


106 Y 


99 P 


0.84 


90 N 


15 K 


0.70 


97 Q 


18 H 


0.68 


82 L 


97 V 


0.67 


104 G 


14 A 


0.64 


RBSA_RBSC (5.6L) 


365 S 


188 Y 


1.00 


112 K 


188 Y 


0.97 


301 A 


211 L 


0.95 


93 Q 


204 G 


0.79 


371 E 


197 R 


0.66 


105 E 


198 Y 


0.64 


APAG_PPIC (0.8L) 




68_V 


73_E 


0.97 


DDPB_ 


DDPC (8.7L) 


148_W 


268_G 


1.00 


279_L 


231_T 


1.00 


20_G 


135_l 


0.99 


327_Y 


180_L 


0.97 


327_Y 


177_G 


0.93 


233_E 


292_K 


0.91 


221_R 


283_D 


0.91 


229_E 


212_P 


0.83 


28_l 


147_A 


0.77 


222_Q 


217_Q 


0.75 


13_G 


127_A 


0.73 


268_L 


137_L 


0.67 


126_R 


293_A 


0.65 


143_S 


279_N 


0.65 


140_T 


272_L 


0.63 


24_l 


135_l 


0.62 


233_E 


185_Y 


0.59 


ENGB NDK (1.0L) 




193_L 


60_F 


0.87 


CLPP_CLPX (1.3L) 


96 F 


271 G 


0.95 


40_E 


267_G 


0.72 



APPB_APPC (1.2L) 



91 I 


474 S 


1.00 


101 P 


68 A 


0.99 


95 C 


481 L 


0.95 


88 V 


473 F 


0.94 


95 C 


477 M 


0.92 


342 I 


98 M 


0.90 


91 I 


477 M 


0.77 


332 S 


90 D 


0.70 


RNFE RNFG (1.8L) 


188_L 


160_V 


0.90 


85_M 


27_N 


0.90 


67_l 


12_L 


0.89 


146_G 


171_T 


0.66 


NRFCNRFD (1.2L) 


130_Y 


92_S 


0.93 


129_Q 


76_T 


0.79 


124_L 


87_H 


0.69 


154_F 


55_T 


0.63 


TOLQ TOLR (4.9L) 


149_l 


27_V 


0.91 


177_A 


27_V 


0.91 


177_A 


25_L 


0.90 


177_A 


26_L 


0.84 


139_Y 


19_V 


0.83 


142_L 


20_P 


0.77 


179_A 


24_V 


0.76 


167_V 


27_V 


0.76 


179_A 


26_L 


0.73 


177_A 


18J 


0.69 


133_V 


102_N 


0.67 


177_A 


19_V 


0.67 


156_L 


19_V 


0.66 


142_L 


22_L 


0.66 


200_N 


11_D 


0.63 


134_G 


19_V 


0.59 


ATKA_ATKC (0.7L) 


190 I 


30 G 


1.00 


448 P 


91 N 


1.00 


249 A 


175 V 


0.99 


186 A 


26 T 


0.99 


183 L 


26 T 


0.84 


249 A 


167 L 


0.79 


11 T 


170 Y 


0.78 


198 F 


24 L 


0.63 


238 A 


83 G 


0.62 


244 F 


55 I 


0.60 


FEOA FEOB (0.9L) 


41_R 


378_F 


0.94 


YEJA YEJB (3.1 L) 




466_D 


325_R 


0.90 



Figure 3. Predicted residue-residue interactions across protein interfaces of unknown structure. Strongly co-evolving residue pairs for complexes without 
known structure that had at least one prediction with GREMLIN score greater than or equal to 0.85. Each row shows the residue pairs, their sequence identity 
and the GREMLIN score. Structure models for complexes highlighted in red are shown in Figure 5. Full dataset is provided with the deposited data. 

DOI: 10.7554/eLife.02030.007 



proteins guided by GREMLIN predictions. Interestingly, the region that undergoes conformational 
change in the activating enzyme upon substrate binding (3c8f -> 3cb8 [Becker and Kabsch, 2002]} is 
in the region we predict to be in contact with PFL. 
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Figure 4. Contact guided protein-protein docking on a benchmark set of 18 protein complexes. (A) Structure 
models for each complex were generated by docking structures of its constituents, at least one of which (blue) was 
not from the structure of the complex guided by coevolution derived distance restraints. The interface C-alpha 
RMSD (iRMSD) of the structural model with the lowest energy to the experimentally determined structure and the 
fraction of native contacts are shown. Structure models for cases in red are shown in B and C and D. (B and C) 
Comparison between native and docked structure for the two largest failures in the benchmark: the large iRMSD is 
due to large conformational changes in the monomers upon docking but the interface is still modeled correctly in 
Figure 4. Continued on next page 
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Figure 4. Continued 

the region not involved in conformational change. (D) Multiple minima in the docking landscape (right) correspond 
to distinct interfaces in the complex (left). 

DOI: 10.7554/el_ife.02030.008 

The following source data and figure supplements are available for figure 4: 
Source data 1. Bound set. 

DOI: 10.7554/el_ife.02030.009 

Figure supplement 1. Docking landscapes showing iRMSD (x-axis) vs GREMLIN restraint score (y-axis). 
DOI: 10.7554/el_ife.02030.010 
Figure supplement 2. Bound set. 

DOI: 10.7554/el_ife.02030.011 



D-methionine transport system 

D-methionine transporter is an ATP-driven transport system that transports methionine. We docked 
the E. coli structure of METI (3tui, chain A and B, Johnson et al., 2012) with a RosettaCM model of 
METQ based on 3k2d {Yu et al., 2011). The resulting docked model is consistent with the top ranked 
GREMLIN predictions {Figure 5). 

Discussion 

Our results demonstrate unequivocally that there is strong selective pressure at protein-protein inter- 
faces beyond simple residue conservation, and that co-evolving residue pairs are nearly always in 
contact in the protein complex. Not all contacting residues across protein interfaces likely co-evolve 
nor all protein-protein interfaces. Nevertheless, as illustrated in Figures 1 and 2, there is clearly suffi- 
cient coevolutionary signal to significantly constrain models of a large number of protein complexes. 

There is a notable contrast in the utility of intra-monomer and intersubunit predicted contacts for 
structure modeling. We found previously {Kamisetty et a/., 2013) that contacts could be predicted 
with high accuracy for monomeric proteins, provided there were sufficient aligned sequences, but in 
such cases there was almost always already a structure of a family member from which comparative 
models could be built, limiting the utility of the predicted contacts in structure prediction (Though 
predicted contacts can be useful in modeling allosteric changes in protein structures [Hopf et a/., 
2012; Morcos et al., 2013]). In contrast, here we find that more than half of the complexes for which the 
protein families of the constituent subunits are sufficiently large for accurate contact prediction do not 
currently have three-dimensional structures. Hence, while predicted contacts can be very accurate for 
both monomeric globular proteins and for protein-protein complexes, they are more useful for struc- 
ture modeling for the latter due to the much poorer representation of protein complexes in the PDB. 

While our approach of constructing a global statistical model from paired sequence alignments 
is generally applicable to any taxa, the current study focuses on prokaryotes and mitochondria. Doing 
so allows us to largely avoid the problem of distinguishing between paralogs by exploiting the operon 
architecture of bacterial genomes {Jacob et a/., 2005). Constructing paired-sequence alignments 
for more complex genomic architectures is more involved and requires the ability to distinguish 
orthologs from paralogs, the subject of active research {Remm et al., 2001; Datta et a/., 2009). 
Protocols for generating paired sequence alignments more generally are an important area for devel- 
opment in this area. 

Materials and methods 
Individual alignment generation 

Multiple sequence alignments were generated for each of the 4303 E. coli protein genes as identified by 
EcoGene 3.0 {Zhou et al., 2013) using HHblits (-n 8 -e 1 E-20 -maxfilt oo -neffmax 20 -nodiff -realign_max 
oo), and HHfilter (-id 100 -cov 75) in the HHsuite (version: 2.0.15, Remmert et al., 2011). To reduce 
redundancy, we constructed HMMs from each MSA and clustered genes based on the HHA {Kamisetty 
et al., 2013), a measure of HMM-HMM similarity: a pair of genes was assigned to the same cluster if the 
HHA is less than 0.5. This procedure resulted in 2340 non-redundant gene clusters. 

For the benchmark set, a new alignment was generated using the sequence associated with each 
PDB. For the 50S ribosome and NADH dehydrogenase, we used Thermus thermophilus HB8 sequences 
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Figure 5. Structure models for complexes with unknown structures. Residue pairs with GREMLIN scores 2 0.60 are connected by yellow bars; the 
structures are pulled apart for clarity. For METQ-METI and PFLA-PFLB GREMLIN scores 2 0.3 are shown. For each docking calculation the docking 
energy landscape is shown, with iRMSD to the selected model on the x-axis. The multiple minima correspond to permutations of the labels on the 
subunits of the homo-oligomer complex. Predicted structures of each complex are provided with the deposited data. 

DOI: 10.7554/eLife.02030.012 



from PDB structures 3uxr (Bulkley et al., 2012) and 4hea {Baradaran et al., 2013) respectively. For 
paralogous NADH dehydrogenase chains L, M, and N, we used an e-value of 1E-60 in the alignment 
generation protocol. In addition to complexes from the E. coli analysis, we also include the GatCAB 
amidotransferase complex in our benchmark set, using sequences from the PDB structure 3ip4 
(Nakamura et al., 2010). For cases where the PDB sequence length was much longer than average 
coverage, we modified the coverage filter to 50% of query. The sequences were then realigned using 
clustal omega v1.2 (—iterations 2 — full-iter) (S/evers et al., 2011). Residues not present in the query 
sequence were dropped from subsequent analysis. 

Paired alignment generation 

We construct alignments of paired protein sequences [x 1f x 2 , x p ; x p+1 , x p+q ] from the same genome 
with positions 1 :p and p+1 :p+q corresponding to the first and second proteins respectively. We refer 
to such a multiple sequence alignment of paired sequences as a paired alignment. 
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For gene families with a single copy in each genome such as the ribosomal proteins, constructing 
paired alignments is straightforward as sequence pairs from the same genome can simply be concat- 
enated. While the process of generating paired alignments in general is complicated in the presence 
of multiple paralogs of a gene in a single genome, in prokaryotes, co-regulated genes are often 
co-located on the genome into operons. We exploit this property to avoid paralogous genes when 
creating paired sequences by restricting to gene pairs that have small, conserved intergenic distances. 
A similar approach was used to construct a database of fusion proteins in prokaryotic genomes (Suhre 
and Claverie, 2004). Defining Agene as the number of annotated genes between a gene pair, we only 
consider pairs with Agene conserved in 60% of genomes and less than 20. To allow for ambiguity in 
annotation, if the second or third most common intergenic distance is within 1 of the mode, these 
gene-pairs are included in the conservation calculation. Given that most UniProt accession IDs are 
serially assigned in a genome (UniProt Accession), Agene can be rapidly evaluated by looking at 
the difference in accession ids. The paired alignment is then filtered to reduce redundancy to 90% 
sequence identity and to remove positions that have more than 75% gaps. 

Identification of protein complex structures 

To identify protein pairs in the same complex structure, a HMM was constructed for each E. coll pro- 
tein using hmmbuild from the already generated HHblits alignments. We then used hmmsearch to 
scan PDB sequences in the S2C database (Wang et al.; Both hmmbuild and hmmsearch are part of the 
HMMER v3.1b package [Eddy, 2009]). Only hits with e-value less than 1E-10 were considered. Protein 
pairs found in the same complex structure (PDB file) were considered to be in contact if a Ca atom in 
one structure was within 12 Angstroms of a Ca atom in the other. 

Gremlin model construction from paired alignments 

GREMLIN constructs a global statistical model of the paired alignment, assigning a probability to 
every amino-acid sequence in the paired alignment: 

p(X, , X 2 . . . , X p ; X p+1 . . . X p+q ) = 1 exp(ip [v, (X, ) + IPJ>. . (X, , X,)]) 

where, the Vi are vectors encoding position-specific amino-acid propensities and the Wg are matrices 
encoding amino-acid coupling between positions i and j. These parameters are obtained from the 
aligned sequences by maximizing the regularized pseudo-likelihood (Balakrishnan et al., 2011) of the 
alignment as described in [Kamisetty et al., 2013): 

v,w = arg max I^I^log P(X, | X,..X M X i+1 ..X p+q ) + R(v, w) 

where, each term in the summation is a conditional distribution capturing the probability of a particular 
amino-acid at a position in the context of the entire protein sequence and R(v,w) is a regularization 
term to prevent over-fitting. 

Previous approaches (Morcos et al., 2011; Jones et al., 2012) estimated v, w using an approximate 
moment matching approach (Kamisetty et al., 2013) by inverting a generalized covariance matrix. 
These rely on a Gaussian-like approximation to the global partition function. Unlike these approaches, 
estimation via the pseudo-likelihood avoids this approximation relying instead on local partition func- 
tions (Balakrishnan et al., 2011; Ekeberg et al., 2013; Kamisetty et al., 2013). The resulting global 
optimization problem can be efficiently solved using standard convex optimization techniques and 
provides estimates for each vector Vj and matrix Wg (Kamisetty et al., 2013). 

Ranking residue pairs with gremlin scores 

To reduce the Wg matrices to single values reflecting the strength of the coupling between positions 
i and j, we first compute Sg, their vector 2-norm (the square root of the averages of the squares of the 
individual matrix elements). We correct for differences in Sg due to sequence variability at different 
positions using the row and column averages of these values: 

corr _ ik k J kj k 

ij ij <S > 

v \l ^kl 

where brackets indicate averages taken over the indices outside the brackets in a manner similar to 
that of Average Product Correction (APC, Dunn et al., 2008). Unlike the APC, we account for 
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differences in the rates of evolution in the two protein families by computing the averages only over 
the positions of the proteins corresponding to positions i and j: if i and j are both in the first (second) 
protein, the averages are computed over the positions in the first (second) protein; if i is in the first 
protein and j in the second, the column average is computed only over the positions of the first protein 
and the row average, only over the positions of the second protein. We then compute a normalized 
coupling strength, ncsij, by dividing the Sy orr by the average of the top 3L/2 s|j° rr values across the two 
proteins (since there are roughly 3L/2 contacts for a protein of length L [Kamisetty et a/., 2013; SI]). 

As illustrated in Figure ID, the relation between normalized coupling strength and contact fre- 
quency varies with the ratio of the number of aligned sequences to the length of the protein complex. 
We also observed that residues were more frequently in contact for a given coupling strength when 
the top score for that complex was high. To account for these dependencies, we constructed a model 
that estimates the probability of being in contact based on the bacterial 50S ribosomal complex: 

GremlinScore(x,N/L) = 1/(1 + exp(-a(x - |j)) 

where 

u = m N/L+1 +c 

and x is Vncs.j for the top scoring contact in each complex and Vncsy scaled by the Gremlin score of the 
top contact in all other cases. The values of m, c, and a (0.47, 0.96, and 9.77 respectively) were deter- 
mined by a non-linear fit to the observed frequencies in the 50S ribosomal data from Figure ID. This 
function accurately accounts for the observed contact frequencies {Figure 1 — figure supplement 1). 

Conversion of gremlin scores to distance restraints 

We converted coupling strengths into residue-pair specific distance restraints and included them in 
the Rosetta structure prediction program. We use sigmoidal distance restraints of the form: 

. . „ weiqht 

restraint(d) = + intercept (1) 

1 + exp(-slope(d - cutoff)) 

where, d is the distance between the constrained atoms and the weight is proportional to ncsij. The 
restraints were introduced between CP atoms (Ca in the case of glycine) in the reduced-atom represen- 
tation of Rosetta (centroid mode) and as ambiguous distance restraints (Lange et a/., 2012) between 
side-chain heavy atoms (cutoff of 5.5 and slope of 4) in the full-atom stage of Rosetta. For the centroid 
mode, restraints used the amino acid pair specific CP-CP cutoff and slopes, as described in Kamisetty 
et a/., 2013 SI Table III. These distance restraints supplement the Rosetta all atom energy; the combina- 
tion ensures the sampling of physically realistic structures consistent with the contact predictions. 

Comparative modeling 

Comparative models were built using RosettaCM (Song et a/., 2013) based on alignments to homol- 
ogous structures generated using HHsearch {Remmert et a/., 2011). For proteins that had missing 
density in regions predicted to be in contact, we used RosettaCM with co-evolution derived restraints 
to build the missing region before docking. 

De Novo modeling 

The Rosetta ab initio protocol consists of two stages: in the initial stage ('centroid') side-chains are 
represented by fixed center-of-mass atoms allowing for rapid generation and evaluation of various 
protein-like topologies; the second stage ('full-atom') builds in explicit side-chains and carries out all 
atom energy minimization {Simons eta/., 1999; Raman eta/., 2009). YIAM, a membrane protein, was 
modeled with the Rosetta membrane energy function {Yarov-Yarovoy et a/., 2012, Barth et a/., 
2007). Strong repulsive interactions {Equation 1, weight: -100, cutoff: 35, slope: 2 and intercept: 100) 
were added between the center of the extracellular regions and the center of predicted intracel- 
lular regions, and strong attractive restraints (weight:100, cutoff:35, slope:2 and intercept: 0) within 
predicted intracellular regions and extracellular regions, effectively constructing a membrane-like sam- 
pling space. We used the consensus output of MESSA {Cong and Grishin, 2012) to predict trans- 
membrane regions. 100,000 models were generated and 20 models that best fit the restraints 
converged to a single cluster. 
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Docking test set 

Jackhammer (part of HMMER v3.1b package; Eddy, 2009) was used to identify a subset of 18 com- 
plexes in the benchmark set where at least one of the proteins or a close homolog had a solved struc- 
ture of its apo form. In cases where the structure was of a homologous protein (e-value < 1E-20) and 
where most of the interface residues were present, we generated a structural model of the target 
protein using comparative modeling. We only considered cases where at least one of the structures 
was unbound as the bound-bound docking problem is not representative of real world docking chal- 
lenges (Betts and Sternberg, 1999). The positive control shown in Figure 4 — source data 1 was run 
on all protein-pairs from the benchmark set, where at least two predicted inter contacts had a high 
GREMLIN score (>0.6). 

Complex assembly by protein-protein docking 

For each inter restraint pair that is in the top 3/2L predictions, we used PatchDock v1 .0, with clustering 
parameters (rmsd 0.5; discardClustersSmaller 0) {Duhovny et al., 2002) to generate an ensemble of 
conformations that were then scored using all the restraints. For tripartite efflux pump, the surface 
segmentation parameters were further modified (low_patch_thr 0; prune_thr 0.1; flat 1), to allow for 
more diverse interfaces. The top 5 models by restraint score were energy-minimized in cartesian space 
using both inter and intra restraints with cycles of minimization and side chain repacking using Rosetta 
as described in Conway et al. (2014). The best scoring model by restraint score was then selected. 

For fraction of native contact (Fnat) and interface root-mean-squared deviation (iRMSD) calcu- 
lation, the interface residue-residue contacts are those where the minimal distance between any 
heavy side-chain atom is less than 5 A. The Fnat calculation is performed as described in Kamisetty 
et al. (2013) SI Table III. 

All structural figures were drawn with PyMOL (The PyMOL Molecular Graphics System, Version 
1.5.0.4 Schrodinger, LLC). 

Data Availability 

The multiple sequence alignments used in the analysis and the full GREMLIN results for all the calcula- 
tions described in the paper are provided at http://gremlin.bakerlab.org/complexes/ along with a 
web-server for paired-alignment generation, coevolution analysis and contact prediction/Rosetta res- 
traint generation. The paired-alignments along with the PDB coordinates of the predicted structures 
are also available at Dryad: Ovchinnikov et al., 2014. 
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Note added in proof 

Two other studies of protein-coevolution using global statistical models have recently appeared: 
Tamir et al., 2014, and Hopf et al., 2014. These studies provide independent validation of the robust- 
ness of global statistical methods for prediction of protein-protein contacts. 
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Author(s) 


Year 


Dataset title 


Dataset ID and/or URL 


Database, license, 
and accessibility 
information 


Reshetnikova L, Moor N, 
Lavrik O, Vassylyev DG 
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PHENYLALANYLTRNA 
SYNTHETASE 
COM PLEXED WITH 
PHENYLALANINE 


1 B70; http://dx.doi. 
org/10.2210/pdb1b70/pdb 


Publicly available at 
the RCSB Protein 
Data Bank. 


Wilkens S, Capaldi RA 


1998 


SOLUTION STRUCTURE 
OF THE EPSILON 
SUBUNIT OF THE 


1 BSN; http://dx.doi. 
org/10.2210/pdb1bsn/pdb 


Publicly available at 
the RCSB Protein 
Data Bank 



F1-ATPSYNTHASE FROM 
ESCHERICHIA COLI AND 
ORIENTATION OF THE 
SUBUNIT RELATIVE TO 
THE BETA SUBUNITS OF 
THE COMPLEX 



ThodenJB,WesenbergG, 1999 STRUCTURE OF CARBA- 
Raushel FM, Holden HM MOYL PHOSPHATE 

SYNTHETASE 
COM PLEXED WITH THE 
ATP ANALOG AMPPNP 



2000 CRYSTAL STRUCTURE 1 DGJ; http://dx.doi. Publicly available at 

OFTHE ALDEHYDE org/1 0.221 0/pdb1dgj/pdb the RCSB Protein 

OXIDOREDUCTASE Data Bank. 

FROM DESULFOVIBRIO 
DESULFURICANS ATCC 
27774 



Rowland P, Larsen S 
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DIHYDROOROTATE 
DEHYDROGENASE A 
FROM LACTOCOCCUS 
LACTIS 


1DOR; http://dx.doi. 
org/10.2210/pdb1dor/pdb 


Publicly available at 
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Data Bank. 


Roberts DL, Salazar D, 
Fulmer JP, Frerman FE, 
Kim JJ-P 
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ELECTRON TRANSFER 
FLAVOPROTEIN (ETF) 
FROM PARACOCCUS 
DENITRIFICANS 


1EFP; http://dx.doi. 
org/10.2210/pdb1efp/pdb 


Publicly available at 
the RCSB Protein 
Data Bank. 


Rowland P, Norager S, 
Jensen KF, Larsen S 


2001 


CRYSTAL STRUCTURE OF 
LACTOCOCCUS LACTIS 
DIHYDROOROTATE 


1EP3; http://dx.doi. 
org/10.2210/pdb1ep3/pdb 


Publicly available at 
the RCSB Protein 
Data Bank. 



DEHYDROGENASE B. 
DATA COLLECTED 
UNDER CRYOGENIC 
CONDITIONS 



1 BXR; http://dx.doi. Publicly available at 

org/10.2210/pdb1bxr/pdb the RCSB Protein 
Data Bank 



Rebelo JM, Macieira S, 
Dias JM, Huber R, 
Romao MJ 
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Kuramitsu S, Yokoyama S, SUCCINYL-COA Data Bank. 
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Unciuleac M, Warkentin E, 
Page CC, Dutton PL, 
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reductase from Thauera 
aromatica 
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org/1 0.221 0/pdb1 qop/pdb 



1 RM6; http://dx.doi. 
org/1 0.221 0/pdb1 rm6/pdb 



Publicly available at 
the RCSB Protein 
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Publicly available at 
the RCSB Protein 
Data Bank. 
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Dorrestein PC, Zhai H, 
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synthase/ThiS complex 



1TYG; http://dx.doi. 
org/1 0.221 0/pdb1 tyg/pdb 



Publicly available at 
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Initiative (RSGI) 
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Hunt J, Northeast Structural 
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1XM3; http://dx.doi. Publicly available at 
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McKeegan KS, 
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2005 The crystal structure of 
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Publicly available at 
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Data Bank. 
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Initiative (RSGI) 
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Publicly available at 
the RCSB Protein 
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Yamada S, Sugimoto H, 
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